Identification of Speaker Origin from Transcribed Speech Text
Authors
Abstract
All sentences or passages quoted in this dissertation from other people's work have been specifically acknowledged by clear cross-referencing to author, work and page(s). Any illustrations which are not the work of the author of this dissertation have been used with the explicit permission of the originator and are specifically acknowledged. I understand that failure to do this amounts to plagiarism and will be considered grounds for failure in this dissertation and the degree examination as a whole.
Humans can recognize a speaker's native language with minimal effort from the way he or she speaks English, and can draw further cues from the speaker's physical characteristics. Computationally, this is an arduous task because speech is spontaneous and the speaker's physical attributes are not available to exploit. This thesis looks into a problem that we believe has not yet been explored: detecting a speaker's native language from his or her pattern of speaking English. These speaking patterns are captured through the text transcriptions of a spoken-language corpus. We use a classifier based on a multinomial distribution that works together with n-gram profiles of words, specifically unigrams, bigrams and trigrams. We present classification experiments based on a reasoned selection of four European and two Asian languages, with native-level American English as the base language. The classifier achieves a fairly high degree of accuracy in separating English text spoken by various non-native and native speakers of English. Unigrams scored fairly well compared with bigrams and trigrams on our selected corpora. However, bigrams show promising results for handling more complex textual variations.
Apart from classifying non-native English text, we also investigated the inter-language relationships among the language categories explored for the non-native English speakers, and identified the language group whose English was closest to native-level English in the selected corpora.
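The abstract describes a classifier based on a multinomial distribution applied to word n-gram profiles, without giving implementation details. One plausible reading is a multinomial naive Bayes model over n-gram counts; the following is a minimal sketch under that assumption, not the thesis's actual implementation. The add-alpha smoothing, the whitespace tokenizer, the class and function names, and the toy transcripts with their `group_a`/`group_b` labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNgramClassifier:
    """Multinomial naive Bayes over word n-gram counts (assumed model, add-alpha smoothing)."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n                          # n-gram order: 1, 2 or 3 in the abstract
        self.alpha = alpha                  # smoothing constant (an assumption)
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = word_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return log prior plus summed log n-gram likelihoods for each label."""
        grams = word_ngrams(text, self.n)
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label in self.docs:
            total = sum(self.counts[label].values())
            denom = total + self.alpha * vocab_size
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log((self.counts[label][gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy transcripts and group labels, invented purely for illustration.
train_texts = [
    "i am agree with this plan",
    "i am agree with the idea",
    "i totally agree with this plan",
    "i totally agree with the idea",
]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

unigram_model = MultinomialNgramClassifier(n=1).fit(train_texts, train_labels)
bigram_model = MultinomialNgramClassifier(n=2).fit(train_texts, train_labels)

print(unigram_model.predict("i am agree"))
print(bigram_model.predict("we totally agree"))
```

Changing `n` switches between unigram, bigram and trigram profiles, which is how the three settings named in the abstract could be compared on the same data.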
Related resources
Multimodal Speaker Identification Based on Text and Speech
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker’s utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means to model each speaker’s vocabulary employing a number of hidden topics, which are closely related to his/her identity, function, or...
Closed-Set Speaker Identification Based on a Single Word Utterance: An Evaluation of Alternative Approaches
The problem of closed-set speaker identification based on a single spoken word from a limited vocabulary is relevant to several current and futuristic interactive multimedia applications. In this paper, we evaluate the effectiveness of several potential solutions using an isolated word speech corpus. In addition to evaluating the text-dependent and text-constrained variants of the Gaussian Mixt...
GlobalPhone: A Multilingual Text & Speech Database in 20 Languages
This paper describes the advances in the multilingual text and speech database GlobalPhone, a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in 20 languages. GlobalPhone was designed to be uniform across languages with respect to the amount of data, speech quality, the collection scenario, the transcription and phone set convent...
Recognition Of Voice Using Mel Cepstral Coefficient & Vector Quantization
The human voice is characteristic of an individual. The ability to recognize the speaker by his/her voice can be a valuable biometric tool with enormous commercial as well as academic potential. Commercially, it can be utilized for ensuring secure access to any system. Academically, it can shed light on the speech processing abilities of the brain as well as the speech mechanism. In fact, this feature...
A phone-based approach to non-linguistic speech feature identification
In this paper we present a general approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. The basic idea is to process the unknown speech signal by feature-specific phone model sets in parallel, and to hypothesize the feature value associated with the model set having the highest likelihood. This technique is shown to be effective...
Robust text-independent speaker identification using Gaussian mixture speaker models
This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterance ...